Segmenting Conversations by Topic, Initiative, and Style

Author

  • Klaus Ries
Abstract

Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents, which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours, or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented that integrates and models these features in a clean framework with good results. Keyword-based segmentation methods degrade significantly on our meeting database when speech recognizer transcripts are used instead of manual transcripts. Speaker initiative is an interesting feature since it delivers good segmentations and should be easy to obtain from the audio. Speech style variation at the beginning, middle, and end of topics may also be exploited for topical segmentation and would not require the detection of rare keywords.

ACM SIGIR’01 Workshop on Information Retrieval Techniques for Speech Applications, New Orleans, Louisiana, September 13, 2001
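To make the idea of segmenting by speaker initiative more concrete, the following is a minimal illustrative sketch, not the probabilistic procedure of the paper. It assumes that each turn is available as a (speaker, word count) pair and scores every candidate boundary by how much the distribution of speaker activity changes across it, using the Jensen-Shannon divergence. The function names, the window size, and the toy data are all hypothetical choices made for this illustration.

```python
from collections import Counter
from math import log2


def speaker_distribution(turns, speakers):
    """Share of words per speaker in a window, with add-one smoothing."""
    counts = Counter()
    for speaker, n_words in turns:
        counts[speaker] += n_words
    total = sum(counts.values()) + len(speakers)
    return [(counts[s] + 1) / total for s in speakers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def boundary_scores(turns, window=4):
    """Score each candidate boundary by the shift in speaker activity.

    A boundary at index i separates turns[i - window:i] from
    turns[i:i + window]; a higher score means a larger shift.
    """
    speakers = sorted({s for s, _ in turns})
    scores = {}
    for i in range(window, len(turns) - window + 1):
        left = speaker_distribution(turns[i - window:i], speakers)
        right = speaker_distribution(turns[i:i + window], speakers)
        scores[i] = js_divergence(left, right)
    return scores


# Invented toy data: speaker A dominates turns 0-5, speaker B turns 6-11.
turns = [
    ("B", 3), ("A", 40), ("B", 4), ("A", 35), ("B", 2), ("A", 50),
    ("B", 45), ("A", 2), ("B", 38), ("A", 5), ("B", 41), ("A", 3),
]

scores = boundary_scores(turns, window=4)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

On this toy input the highest score falls at turn index 6, the point where the dominant speaker changes. A real system would need to combine such a score with other evidence and a segmentation model; this sketch only shows why speaker activity alone can carry a boundary signal without any keyword detection.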

Similar resources

Supervised Topic Segmentation of Email Conversations

We propose a graph-theoretic supervised topic segmentation model for email conversations which combines (i) lexical knowledge, (ii) conversational features, and (iii) topic features. We compare our results with the existing unsupervised models (i.e., LCSeg and LDA), and with their two extensions for email conversations (i.e., LCSeg+FQG and LDA+FQG) that not only use lexical information but also...

Compassionate Conversations

Staff engagement is much more than just a bonus in any organisation. CQC data shows that it is very clearly linked to positive results in both patient and staff outcomes (fewer complaints, improved safety, reduced sickness, fewer accidents, and more as per Michael West). Staff engagement may seem nebulous but is in fact measured routinely annually in the National Staff Survey. The problem is th...

F0 correlates of topic and subject in spontaneous Japanese speech

This paper examines F0 correlates of morphologically marked grammatical functions, in particular topic and subject, in spontaneous Japanese speech. Our data consist of F0 measurements of 7,106 nouns in the CallHome Japanese corpus of telephone conversations [4]. We find that topics exhibit higher peak F0 than subjects, contradicting information-structure accounts which predict that topics, whic...

A Hierarchical Bayesian Model for Topic Segmentation

Many streams of real-world data, such as conversations or body movements, consist of relatively coherent segments, each characterized by particular topics or controllers. Making sense of these data requires simultaneously segmenting the sequences and inferring the structure of the segments. We present a hierarchical Bayesian model that can be used to break a sequence of utterances or movements ...

An Initial Test Collection for Ranked Retrieval of SMS Conversations

This paper describes a test collection for evaluating systems that search English SMS (Short Message Service) conversations. The collection is built from about 120,000 text messages. Topic development involved identifying typical types of information needs, then generating topics of each type for which relevant content might be found in the collection. Relevance judgments were then made for gro...

Publication date: 2001